Improving cache performance by runtime data movement
Abstract
The performance of a recursive data structure (RDS) increasingly depends on good data cache behaviour, which may be improved by software/hardware prefetching or by ensuring that the RDS has a good data layout. The latter is harder but more effective, and requires solving two separate problems: firstly, ensuring that new RDS nodes are allocated in a good location in memory, and secondly, preventing a degradation in layout when the RDS changes shape due to pointer updates. The first problem has been studied in detail, but only two major classes of solutions to the second exist. Layout degradation may be side-stepped by using a ‘cache-aware’ RDS, one designed to have inherently good cache behaviour (e.g. using a B-Tree in place of a binary search tree), but such structures are difficult to devise and implement. A more automatic solution in some languages is to use a ‘layout-improving’ garbage collector, which attempts to improve heap data layout during collection using online profiling of data access patterns. This may carry large performance, memory and latency overheads. In this thesis we investigate the insertion of code into a program which attempts to move RDS nodes at runtime to prevent or reduce layout degradation. Such code affects only the performance of a program, not its semantics. The body of this thesis is a thorough and systematic evaluation of three different forms of data movement. The first method adapts existing work on static RDS data layout, performing ad-hoc single node movements at a program’s pointer-update sites; it is simple to apply and effective in practice, but the performance gain may be hard to predict. The second method performs infrequent movement of larger groups of nodes, borrowing techniques from garbage collection but also embedding data movement in existing traversals of the RDS; the benefit of performing additional data movement to compact the heap is also demonstrated.
The third method restores a pre-chosen layout after each RDS pointer update, which is a complex but effective technique, and may be viewed both as an optimisation and as a way of synthesising new cache-aware RDSs. The investigation concentrates on maximising performance while minimising latency and extra memory usage, and uses two fundamental RDSs that are representative of two common data access patterns (linear and branching). The methods of this thesis compare favourably to upper bounds on performance and to the canonical cache-aware solutions. This thesis shows the value of runtime data movement: as well as producing optimisations useful in their own right, its methods may be used to guide the design of future cache-aware RDSs and layout-improving garbage collectors.
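As a rough illustration of the idea behind the first method (moving a single node at a pointer-update site), the toy sketch below models the heap as an array of slots and a "cache block" as a run of `BLOCK` adjacent slots. Everything in it is an assumption made for illustration and is not taken from the thesis: the names `Pool`, `BLOCK`, `insert_after` and `to_list`, the first-fit allocator, and the policy of moving a newly linked node into its predecessor's block when a slot is free. It also assumes that only the predecessor refers to the moved node, so a single pointer fix-up is enough.

```python
NIL = -1    # stands in for a null pointer
BLOCK = 4   # assumed number of node slots per cache block (hypothetical)


class Pool:
    """Toy heap: slot indices play the role of memory addresses."""

    def __init__(self, size):
        self.value = [None] * size
        self.next = [NIL] * size
        self.used = [False] * size

    def alloc(self, value):
        # First-fit allocation; raises ValueError if the pool is full.
        slot = self.used.index(False)
        self.used[slot] = True
        self.value[slot] = value
        self.next[slot] = NIL
        return slot

    def release(self, slot):
        self.used[slot] = False
        self.value[slot] = None
        self.next[slot] = NIL

    def free_slot_in_block(self, slot):
        # Return a free slot in the same block as `slot`, or NIL if none.
        start = (slot // BLOCK) * BLOCK
        for s in range(start, min(start + BLOCK, len(self.used))):
            if not self.used[s]:
                return s
        return NIL

    def move(self, src, dst):
        # Copy the node to `dst` and free `src`; the caller fixes up pointers.
        self.value[dst] = self.value[src]
        self.next[dst] = self.next[src]
        self.used[dst] = True
        self.release(src)
        return dst


def insert_after(pool, pred, node):
    """Pointer-update site: link `node` after `pred`, then try to co-locate."""
    pool.next[node] = pool.next[pred]
    pool.next[pred] = node
    if node // BLOCK != pred // BLOCK:
        dst = pool.free_slot_in_block(pred)
        if dst != NIL:
            node = pool.move(node, dst)
            pool.next[pred] = node  # assumption: pred holds the only reference
    return node


def to_list(pool, head):
    # Walk the list and collect values; used to check semantics are unchanged.
    out = []
    while head != NIL:
        out.append(pool.value[head])
        head = pool.next[head]
    return out
```

The move changes only where the node lives, not what the list contains, which mirrors the abstract's point that such code affects performance but not semantics. A real implementation would also have to deal with other references to the moved node and with the cost of the move itself.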
Similar resources
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC
Sac is a functional array processing language particularly designed with numerical applications in mind. In this field the runtime performance of programs critically depends on the efficient utilization of the memory hierarchy. Cache conflicts due to limited set associativity are one relevant source of inefficiency. This paper describes the realization of an optimization technique which aims at...
Extended Abstract: Convex Partitioning of Large-Scale Directed Graphs
Directed graphs play an important role in combinatorial scientific computing (CSC) due to their modelling power for algorithms, workflow execution and communication patterns. However, when modelling a CSC problem as a graph partitioning problem, the direction information is generally ignored. There are several problems where the directionality is crucial: for instance, if out-of-core execution ...
Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals
Processor speed is improving at a faster rate than the speed of main memory, which makes memory accesses increasingly expensive. One way to solve this problem is to reduce miss ratio of the processor’s last level cache by improving its replacement policy. We approach the problem by co-designing the runtime system and hardware and exploiting the semantics of the applications written in data-flow...
Towards modeling a complex geological simulation
Data motion is a significant factor affecting runtime performance. Data-intensive applications are subject to the effects of data motion more so than other applications. This research uses abstract machine models to calculate runtime performance expectations for a geological simulation program. The models are based on the time to execute double-precision floating-point instructions and the time...
To Ann - Sofie
To reduce latency and increase bandwidth to memory, modern microprocessors are designed with deep memory hierarchies including several levels of caches. For such microprocessors, the service time for fetching data from off-chip memory is about two orders of magnitude longer than fetching data from the level-one cache. Consequently, the performance of applications is largely determined by how we...